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1  Introduction 


Half-life  studies  of  environmental  contaminants  in  humans  are  generally  restricted  to  only 
a  few  measurements  taken  after  the  initial  exposure.  The  initial  dose  is  usually  unknown 
because  the  exposure  occurred  before  the  environment  was  known  to  be  contaminated.  We 
assume  that  a  one  compartment  first  order  decay  model  with  decay  rate  A  holds  for  subjects 
with  body  burden  above  the  background  level  determined  by  a  threshold  c.  Subjects 
are  included  in  the  study  only  if  their  body  burden  is  greater  than  c.  The  threshold  is 
defined  to  be  a  high  quantile,  such  as  the  99th  percentile,  of  the  biomarker  distribution  m  a 
control  population.  We  assume  that  the  concentration  of  the  contaminant  is  log-normally 
distributed  which,  together  with  the  first  order  decay  model,  implies  a  repeated  measures 
linear  model  relating  the  logarithm  of  the  concentration  and  time,  with  slope  -  A. 

Based  on  this  first  order  decay  model  and  data  for  36  Ranch  Hand  veterans  whose 
1982  and  1987  2,3,7,8-tetrachlorodibenzo-p-dioxin  (dioxin)  measurements  were  above  10 
parts  per  trillion  (ppt),  Pirkle  et  al.  (1989)  obtained  a  median  dioxin  half  life  of  7.1  years 
with  a  95%  confidence  interval  of  5.8  to  9.6  years.  Michalek  et  al.  (1992)  used  a  repeated 
measures  linear  model  to  investigate  the  effect  of  the  percentage  of  body  fat  (PBF)  on  the 
decay  rate  of  dioxin.  Using  weighted  least-squares  (WLS)  estimates,  they  found  a  border¬ 
line  significant  association  between  the  decay  rate  and  PBF,  with  the  decay  rate  of  lean 
subjects  being  greater  than  the  decay  rate  of  obese  subjects.  Utilizing  the  same  repeated 
measures  linear  model  and  data  collected  in  1982,  1987  and  1992,  in  veterans  with  more 
than  10  ppt  body  burden,  Michalek  et  al.  (1996a)  obtained  an  estimate  of  the  decay  rate 
using  SAS  PROC  MIXED  and  used  it  to  obtain  an  estimate  of  the  half-life  which  was 
corrected  for  bias.  In  fact,  they  obtained  the  expression  for  the  bias  in  the  estimate  of  the 
decay  rate  utilizing  the  conditional  normal  moments  from  Tallis  (1961).  They  showed  that 
when  the  data  are  truncated  above  a  line  with  a  slope  —A,  the  WLS  estimate  of  the  decay 
rate  becomes  unbiased.  Their  investigation  has  been  carried  out  under  AR(1)  and  Toeplitz 
assumptions  for  the  within  subject  covariance  matrices.  However,  their  procedure  did  not 
account  for  covariates  such  as  PBF  or  age.  Thus,  it  would  be  of  interest  to  develop  an  es¬ 
timate  of  the  decay  rate  adjusted  for  covariates.  It  would  also  be  of  interest  to  develop  an 
estimate  of  the  decay  rate  using  a  repeated  measures  linear  model  based  on  truncated  data. 
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In  this  report  we 


•  develop  a  WLS  estimate  of  the  decay  rate  based  on  a  repeated  measures  Unear  model 
and  correct  it  for  bias  assuming  a  general  k,  where  k  represents  the  number  of  repeated 
measures  per  subject. 

•  develop  a  WLS  estimate  of  the  decay  rate  in  the  above  setting  in  the  presence  of  a 
categorical  covariate. 

•  develop  a  maximum  likelihood  (ML)  estimate  of  the  decay  rate  in  the  presence  of 
truncation. 

In  Report  I,  we  present  the  repeated  measures  linear  model  and  derive  a  closed  form 
expression  for  the  WLS  estimate  of  the  decay  rate.  As  is  weU  known,  WLS  estimates  are 
biased  if  the  subjects  are  selected  based  on  a  high  or  low  value  of  the  response  variable. 
We  show  that  by  properly  adjusting  the  data,  the  estimate  of  A  can  be  made  unbiased.  In 
addition  we  obtain  expressions  for  other  parameter  estimates  to  facilitate  a  study  of  their 
statistical  properties. 

In  Report  II,  we  present  the  repeated  measures  Unear  model  which  contains  a  cat¬ 
egorical  covariate  and  its  interaction  with  time  and  derive  closed  form  expressions  for  the 
WLS  estimates  of  the  associated  parameters.  Some  special  cases  of  the  covariance  structure 
are  considered.  We  obtain  expressions  for  the  bias  in  the  estimates  of  the  regression  param¬ 
eters  under  the  condition  that  the  body  burden  of  dioxin  in  the  subjects  included  in  the 
study  is  greater  than  a  threshold  c.  The  estimates  are  then  made  unbiased.  The  resulting 
estimates  of  the  half-life  are  computed  using  Air  Force  Health  Study  (AFHS)  data. 

In  Report  III,  we  derive  ML  estimates  of  the  parameters  under  the  assumption 
that  the  logarithms  of  dioxin  measurements  at  the  k  time  points  for  each  subject  have 
a  truncated  multivariate  normal  distribution.  The  estimates  are  obtained  by  an  iterative 
process  similar  to  the  estimation-maximization  (EM)  algorithm.  The  WLS  estimates  serve 
as  initial  estimates  in  the  truncated  case.  An  estimate  of  the  asymptotic  covariance  matrix 
of  the  resulting  estimators  is  also  derived,  which  can  be  used  to  construct  the  asymptotic 
confidence  intervals  for  the  parameters. 
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REPORT  I 


Unbiased  Estimation  of  Regression  Parameters 
Adjusted  for  Bias  Due  to  Left  Truncation1 


1This  research  was  carried  out  in  collaboration  with  Dr.  Kishan  G.  Mehrotra,  Syracuse  University, 
under  the  University  Resident  Research  Program  of  the  Air  Force  Office  of  Scientific  Research. 
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Abstract 


Least-squares  estimates  of  regression  parameters  are,  in  general,  unbiased.  However,  if  the 
observations  on  the  response  variable  are  truncated  then  these  estimates  become  biased 
due  to  truncation.  For  example,  half-life  studies  of  environmental  contaminants  are  based 
on  only  a  few  measurements  per  subject  taken  after  the  initial  exposure.  The  subjects 
are  included  in  the  study  only  if  their  body  burden  is  above  a  threshold  c.  It  is  assumed 
that  the  first  order  decay  model  with  one  compartment  holds  which  leads  to  a  repeated 
measures  linear  model  relating  the  logarithm  of  the  contaminant  concentration  with  time 
and  other  covariates.  The  negative  of  the  coefficient  of  time  represents  the  decay  rate  (A), 
with  the  half-life  of  the  contaminant  given  by  tl/2  =  ln(2)/A.  The  usual  WLS  estimate  of 
A  is  biased  due  to  regression  to  the  mean.  We  show  that  the  regression  parameter  can  be 
made  unbiased  by  properly  selecting  the  subjects  under  study.  This  selection  results  in  a 
small  loss  in  the  sample  size  but  generally  improves  the  mean-squared  error  of  the  estimate. 
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1  Introduction 


Half-life  studies  of  environmental  contaminants  in  humans  are  generally  restricted  to  only 
a  few  measurements  taken  after  the  initial  exposure.  The  initial  dose  is  usually  unknown 
because  the  exposure  occurred  before  the  environment  was  known  to  be  contaminated. 
We  assume  that  a  one  compartment  first  order  decay  model  with  decay  rate  A  holds  for 
subjects  with  body  burden  above  a  background  level  determined  by  a  threshold  c.  Subjects 
are  included  in  the  study  only  if  their  body  burden  is  greater  than  c.  The  threshold  is 
defined  to  be  a  high  quantile,  such  as  the  99th  percentile,  of  the  distribution  of  the  body 
burden  of  the  contaminant  in  a  control  population.  We  assume  that  the  concentrations 
are  log-normally  distributed  which,  together  with  the  first  order  decay  model,  implies  a 
repeated  measures  linear  model  relating  the  logarithm  of  the  concentration  and  time,  with 
slope  -  A.  It  is  well  known  that  the  WLS  estimate  of  A  is  biased  due  to  regression  to  the 
mean  because  of  the  way  the  subjects  were  included  in  the  study.  It  has  been  shown  that 
(see  Michalek  et  al.  (1996a))  if  the  data  are  properly  conditioned,  the  WLS  estimate  of 
A  can  be  made  unbiased.  This  process  is  appealing  because  unbiased  estimates  can  be 
obtained  through  the  commercially  available  packages  such  as  SAS.  However,  their  results 
were  restricted  to  special  cases  of  the  underlying  covariance  structures  and  small  values  of 
k.  In  particular,  they  have  shown  that  if  the  covariance  matrix  is  AR(1)  and  k  =  3,  then 
the  samples  can  be  adjusted  such  that  estimate  of  A  is  unbiased.  This  can  be  achieved  by 
using  SAS  without  investing  in  any  special  purpose  computer  program. 

In  this  paper  we  generalize  the  result  of  Michalek  et  al  (1996a)  for  any  dimension 
and  show  that  the  samples  can  be  adjusted  such  that  the  WLS  estimate  of  A  is  unbiased. 
In  addition  we  obtain  expressions  for  other  parameter  estimates  to  facilitate  study  of  their 
statistical  properties. 

In  section  2,  we  present  the  repeated  measures  Unear  model  and  derive  closed  form 
expressions  for  the  WLS  estimates.  Some  special  cases  are  considered  in  section  3.  In 
section  4  we  obtain  an  expression  for  the  bias  in  the  estimate  of  the  regression  parameter 
under  the  condition  that  the  body  burden  in  the  subjects  included  in  the  study  is  greater 
than  a  threshold  c.  The  estimates  are  then  made  unbiased. 
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2  Model  and  Analysis 


We  assume  that  k  observations  were  taken  per  subject  for  each  of  n  subjects.  These  subjects 
were  exposed  to  a  contaminant  that  produced  an  elevation  in  the  body  burden  greater  than 
a  background  level.  Suppose  that  C0  denotes  the  initial  (unknown)  concentration  and  Ct 
denotes  the  concentration  t  years  after  the  exposure.  Then  a  first-order  kinetic  model 

Ct  =  C0e~M  C1) 

holds  in  the  subjects  with  body  burdens  above  a  threshold  c  and  A  denotes  the  (unknown) 
constant  decay  rate.  Based  on  equation  (1),  the  population  half-life  is  given  by 


By  taking  the  natural  logarithm  of  equation  (1),  we  obtain 

In (Ct)  =  ln(C0)  -  At.  (3) 

Equation  (3)  can  be  regarded  as  a  motivation  for  a  linear  regression  model  with  repeated 
measurements  incorporating  subject  effects. 

Let  y{  denote  the  column  vector  of  k  observations  on  the  ith  subject  taken  at  times 
(til,  t»2 ,  •  •  • ,  Lfc).  The  regression  model,  accounting  for  the  subject  effects,  is  given  by 

Vij  =  ft  +  Mi  +Ti  +  tij,  (4) 

for  i  =  1, . . . ,  n  and  j  =  1, . . . ,  k,  where  £"=i  n  =  0.  Here  Vij  denotes  the  natural  logarithm 
of  the  jth  measurement  on  the  ith  subject,  -A  is  denoted  by  ft,  and  e{j  denotes  normal 
error  with  mean  0.  Our  goal  is  to  obtain  AA/TS  estimates  of  ft,  ft  and  the  t*  s. 

The  inclusion  of  subjects  in  the  study  who  have  yij  >  log(c)  causes  left  truncation 
and  WLS  estimates  of  the  parameters  are  not  necessarily  unbiased.  WLS  estimates  of 
the  parameters  and  the  associated  bias  can  be  obtained  in  the  vector  representation.  But, 
because  we  want  to  explore  the  possibilities  of  correcting  for  bias  in  one  or  more  parameters, 
it  is  necessary  to  obtain  their  explicit  forms.  In  the  following  discussion  these  estimates 
are  obtained  and  it  is  shown  that  the  WLS  estimate  of  ft  can  be  made  unbiased  by 
appropriately  selecting  the  subjects  under  study. 

It  is  convenient  to  analyze  the  problem  in  terms  of  vectors  of  observations  for  sub¬ 
jects.  To  that  end,  we  use  the  following  notations. 
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Notations:  For  i  =  1, 2, . . . ,  n, 

•  yi  denotes  the  fc-dimensional  vector  of  observations  for  the  ith  subject,  =  {yn ,  Va,  •  •  •  i  Vik)> 

•  e  denotes  the  nfc-dimensional  vector  of  errors, 

•  ti  denotes  the  ^-dimensional  vector  of  times  for  the  ith  subject,  t\  =  (t*i,  U2, 

•  $  denotes  the  covariance  matrix  of  y^ 

•  Y  denotes  the  column  vector  of  all  nk  y-observations,  Y  =  [j/i ,  V2i  -  •  •  ?2/n]> 

•  ( 3  denotes  the  column  vector  of  all  (n  +  1)  parameters,  (3l  =  [fo with  r*  = 

fal>  ^2?  •  •  •  1  Tn— 1]> 

•  a  =  lt$'_1l,  where  1  denotes  the  fc-dimensional  column  vector  with  all  elements 
equal  to  1, 

•  t*  =  1 

•  yl  =  1 

•  t  =  TI  1  X/i=l  tii 

•  t*  —  n  1  ) 

•  CM  = 

Thus,  the  model  described  in  equation  (4)  can  be  rewritten  as 

Y  =  Xf3  +  e,  (5) 

where  the  design  matrix  X  is 


1 

ti  : 

1 

0  ••• 

1 - 

O 

Xi 

1 

t2  : 

0 

1  ••• 

0 

= 

x2 

1 

|_L 

tn  : 

-1 

-1  ••• 

-1 

.  X„ . 

The  WLS  estimate  of  /3  is  the  solution  of  the  system  of  equations 

(xV^X)  0  =  XlV-lY,  (6) 
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where 


$0  -  0 

o  $  ...  0 

V-  .  .  .  .  • 

0  0  ••  •  $ 

Since  XV-1^  =  £?=i  X^-'Xi,  and  XW^Y  =  Ef=i  Xl^Vi, follows  from  (6) that 
the  WLS  estimate  of  (3  satisfies 

*=1  V  i= 1 


txfr-'y, 

\i=l  /  i=l 

Straightforward  multiplication  gives 


A 

Bl 


B 

D 


and 


i=l 


E?=i  vl 

zr=i 

y*-y*n 

Vn— 1  Vn 


(7) 


na 

nt* 

:  0 

0 

0 

nt* 

n(S2  +  CM) 

: 

i  : 

*  tH 

•+o 

••• 

(£-1  - 

0 

(t;  -  *;) 

:  2a 

a 

a 

i=l 

0 

(«;  - «;) 

:  a 

2a 

a 

0 

(*;-!  -  «) 

:  a 

a 

2a 
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The  inverse  of  E^i  can  be  obtained  by  using  its  block  representation  and  ex¬ 

pressions  for  the  WLS  estimate  are  found  by  simplifying  the  set  of  linear  equations.  Let 

-1 


P  Q 
Q  R 


(£*•''4  ■ 

Then  (see  problem  2.8  in  Rao  (1973)), 

P  =  A'l  +  FE~1Ft,  Q  =  -FE~\  R  =  E~\ 

where 

E  =  (D  —  BtA~1B)1  and  F  =  A~lB. 

Then, 


V 


(S2  +  CM)  -t* 

-t*  a 

where  V  =  n2[a(S2  +  CM)  -  (f*)2]  is  the  determinant  of  A.  After  substituting  for  B,D, 
and  A'1,  applying  problems  2.7  and  2.8  in  Rao  (1973),  and  some  simplification  we  get 


R  =  E~x  =  - 


I  -  -11* 


a  L  n 
P  =  A^1  + 


+ 


n 


-rr* 


nzs2 


Q  = 


Va  (V  —  n2s 2) 
V 


a(V  —  n2s 2) 

(i*)2  —t*a 
—t*a 


or 


i-i 


0* 

T- 


a  [D  —  n2s2] 

where  s2  =  £  —  t* )2  an<^  ^  =  ((^i  —  ^*)>  (^2  —  ^  )>  •  •  *  >  (^n-i  —  ^  ))  • 

Expressions  for  the  elements  of  p  are  obtained  by  simplifying  equation  (7).  After 
substituting  for  the  inverse  of  E”=i  X*$_1Xi,  we  obtain 


a  =  l[r-?A] 

a  1  J 

pi  = 


n 


[V  -  n2s2) 
n 

[D  —  n2s 2 


.  2=1  «=1 


£«h/< 

.2=1 


(8) 


(9) 


and 


y{  -  y* 

t-»  * 

1 

't1 

r  = 

y*2~y* 

-Pi 

n-t * 

_  Vn—l  -  y* . 

Jn-  l~i\ 

(10) 
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where 


a\  =  [ati*  -  $_1  =  (oi[l],Oi[2],. .  .,«*[*]) . 

Since  a*l  =  =  0,  it  follows  that  that  EjU  <k]j\  =  0  or 

Ej=iai[i]  =  -°iW-  Hence, 


n 

[D  —  n2s2] 


(0i^  1  ”  (®w  “  yifc)  • 


I  i=l  J=1 


Because  the  y^s  appear  in  fa  only  as  differences,  it  will  be  shown  that  fa  can  be  made 
unbiased  as  long  as  we  can  adjust  the  subjects  in  the  sample.  This  observation  is  elaborated 
in  Section  4.  First,  we  mention  some  special  cases  of  interest. 


3  Special  Cases 

In  this  section  we  consider  two  special  cases  of  the  correlation  matrix  that  are  the  most 
likely  candidates  for  the  repeated  measures  model.  In  the  first  case,  we  assume  that  the 
correlation  matrix  satisfies  the  AR(1)  assumption  and  in  the  second  case  we  relax  this 
assumption  and  assume  that  it  is  given  by  the  Toeplitz  matrix.  In  each  case  we  consider 
the  cases  k  =  3  and  4.  We  also  consider  the  case  when  all  subjects  are  exposed  to  the 
contaminant  at  the  same  time,  the  rest  of  the  observations  are  not  necessarily  at  fixed 
intervals,  and  the  correlation  matrix  has  AR(1)  structure. 

3.1  AR(1)  Correlation  Matrix,  k  =  3 

Suppose  k  =  3,  tn,  *<a,  and  ti3  are  equally  spaced,  with  ti2  =  U i  +  A,  ti3  =  ti2  +  A  for  all 
values  of  i  for  some  fixed  A,  and  $  satisfies  AR(1)  conditions, 

1  P  P2' 

$  =  pip 

.  P2  P  1  . 

For  convenience,  let  U  =  tn  for  i  =  1, . . . ,  n.  It  can  be  seen  that  in  this  case 

nS 2  =  aE(ti-t)2, 
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CM 


at2  +  2at  A  +  5  „4,P"— A2, 


1  +  p2 


ns  = 


r 


a 


a2E(^-^ 

l 

a(t  +  A), 

3 -p 
1+p’ 


and 


a*  = 


aA 


(1  -  P2) 


[-1,  0,  1], 


where  i=J'EJLi  V  Consequently, 

2  2  „2  a  2 

V  -  ns*  =  — - — ^  A  , 


and 


A  = 


(i  -  p)2 

w(l  -  p)2(1  +  p) 
2(3  -  p)n2A2 


(3  -  p)A  A  ,  x 
(i  -  p)*(i + rt  £  (m‘  Vl‘\ 


2nA  ^  '*3'  ^ 

/?o  =  — -  E  (Sfei  +  C1  ""  p)yi 2  +  ^3]  -  (i  +  A)/?i , 


fi  =  {{yi  1  +  (l  -  p)y*2  +  yts)  -  (z/ii  +  (1 _  p)y^  +  ^3)}  -  Aa{U  *)• 


Michalek  et  al.  (1996a)  considered  this  special  case  in  greater  detail. 


3.2  AR(1)  Correlation  Matrix,  k  =  4 


We  consider  the  case  that  k  =  4  and  the  Vs  are  equally  spaced,  with  ti2  -  V  -  U 3  -  ti2  - 
£i4  —  ti3  =  A.  For  this  reason  we  denote  V  =  U  and  t  =  n  1  X)E=i  V  We  assume  that  the 
covariance  structure  is  given  by  the  AR(1)  model 

1  P  P2  P3' 

*-  \  1  y2  . 

p  p  1  p 

.  p3  p2  p  1 . 
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It  can  be  seen  that 


nS 2  =  OiY^iU  —  t)  ? 

i= 1 

2(2 -p) 

1  +  P 

p  6(2  —  p)-  5(1  -p»)-6p  +  9  3 

CM  =  at  -\ — — - 1 H - ^  5 

1+P 


a  — 


1-P2 


and 


ns2  =  a2  ^(i*  - 1)2, 

„  tIa3*2"'’) 

t  =  at  +  A  1+^  , 


aj  =  (i  +^)o-7) A H3 ~ rt'  -(1_A  (1_A  (3_p)l 


Thus, 


(10-5p  +  p2)(nA)^ 


y;  [(3  —  p)(j/i4  —  Vi  1)  +  (1  —  P2)(j/i3  2/^2)]  • 


Finally,  &  and  f  are  obtained  from  the  general  forms  in  equations  (8),  (9)  and  (10)  with 
t*  defined  above  and 

y *  =  — — [(j/il  +  2/14)  +  (1  —  P){Vi2  +  Vm)] 

1  +  p 

y*  =  ~  +~  [(?i  +  Vd  +  (1  “  P)( 2/2  +  2/3)]  • 


3.3  Toeplitz  Correlation  Matrix,  k  —  3 


As  in  section  3.1  and  3.2,  the  Vs  are  equally  spaced  but  now  we  assume  that  the  correlation 
matrix  is  Toeplitz  of  order  3, 


1  Pi  P2 


$  = 


Pi  1 


P2  Pi 


Pi  ' 
1 


Straightforward  simplifications  give, 

a  = 


3  —  4  pi  +  £2 

1  —  2pi  +  p2  ’ 
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n 


[' V  —  n2s 2] 

Oi[2] 


(1  —  2pi  +  P2)(l  ~  P2) 

2n{3  -  Apx  +  p2)A2  ’ 
roi  _  A(3  -  Api  +  P2) 

a^-(1_2p1  +  p2)(l-p2)’ 

0. 


Consequently,  as  in  the  case  of  AR(1)  model, 


^  2nA  ^ 


This  estimate  is  equal  to  the  estimate  obtained  in  section  3.1.  Estimates  of  po  and  t  are 
obtained  from  the  general  formulas  in  (8)  and  (10). 


3.4  Toeplitz  Correlation  Matrix,  A;  =  4 


As  in  the  previous  case,  the  tj’s  are  equally  spaced  but  the  correlation  matrix  has  a  Toeplitz 
structure, 


It  can  be  seen  that, 


a 


— ai[l] 


-a*[  2] 

n 

[D  -  n2s 2] 


1 

Pi 

P2 

Pz 

Pi 

1 

Pi 

P2 

P2 

Pi 

1 

Pi 

.  Pz 

P2 

Pi 

1  _ 

2- 

Pi” 

2p2 

+  P3 

1  +  pi  —  (Pi  —  P2)2  +  Pi(P2  +  Pz )  ’ 

r,i  (4pi  -  P2  -3)(-2  +  2p2  +  pi  -  Pz)  A 
ai[4]  -  q  A’ 

roi  _  (1  ~  3Pi  +  3p2  -  p3)(2  ~  Pi  ~  2^2  +  P3)  A 
a-i|3]  —  “ q 

_ Q _ 

nA2(10  —  15pi  +  6  p2  —  Pz){2  —  Pi  —  %P2  +  Pz)  ’ 


where  Q  denotes  the  determinant  of  $.  Finally, 


Pi  =  — - — — - - - [(3  -  Api  +  P2)(y»4  -  yn)  +  (1  -  3Pi  +  3P2  ~  Pz)(yiz  -  yap]  • 

10  -  15pi  +  6 p2  -  Pz 

Other  estimates  are  obtained  from  the  general  formulas  (8)  and  (10),  and  the  above  estimate 
of  Pi. 
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3.5  AR(1)  Correlation  Matrix,  k  =  3,  tn  =  t\ 


In  this  case  all  tn  are  the  same,  tn  =  tu  whereas  values  of  ti2  and  tiZ  satisfy  h  <  ti2  <  ti3 
and  are  otherwise  arbitrary.  We  assume  that  the  correlation  matrix  has  AR(1)  structure. 

Then, 

nS2  =  £[(i  +  p2)(^  -  h?  -  -  h)(tiz  -  h)  +  (t»  -  i3)2] , 

l  -  p 2  L 

CM  =  - — - — 7  \t\  —  +  (1  +  p2)t 2  —  2/)?2^3  +  ^3]  1 

1  —  p*  1 

ns2  =  7-7—2  E  K1  “  P)  (to  -  *2)  +  (to  —  *3)] , 

1+P2fci 

F  =  - - - (tl  +  ?3  +  (1  —  P)^)- 

1  +P 

From  case  1,  we  know  that  a  =  (3  -  p)/(l  -  p).  The  vector  a*  is 


1 

ai"(l  +  p)(l-p2) 


2ti  —  (1  +  p)t$2  —  (1  —  p)t is 
(1  +  p)(—ti  +  2tj2  —  to) 

—  (1  —  p)t\  —  (1  +  p)ti2  +  2tj3  _ 


Using  the  fact  that  aj[l]  +  a* [2]  +  a*  [3]  —  0  and 

alS*  +  CM]-<??-s*  =  (1  +  ^  -  ,2)  [i  (t1  +  »)  t  ' &  ~  (1  +  »)  t  + 1 4) 

—  ((1  +  p)ht2  +  (1  —  p)t\i3  —  ?i)] , 


we  obtain 

£  =  2n  [A(tl-  B(t)i  §  -  (X  +  ^  -  (1  -  {y«  -  ya)  - 

[2t\  —  (1  +  p)ti2  —  (1  p)to]  ( Ui2  —  2/il)}  » 


where 


and 


A(t)  =  n  1  ((1  +  p)  ]Ft22  -  (1  +  p)£toto  + 

t  i=l  i=l  <=1  ' 


B(t )  =  |(1  +  p)il^2  +  (1  ^l}  • 

Estimates  of  /?o  and  the  subject  effects  are  obtained  from  the  general  formulas  (8)  and  (10). 
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4  Bias  and  bias  correction  of  Pi 


It  is  well  known  that  if  the  model  (in  its  matrix  formulation)  is  given  by  (5),  then  3  is  an 
unbiased  estimate  of  ft  More  precisely,  if  E {y^)  =  ft  +  PiUj  +  n,  then  E3  =  ft  But,  in 
the  current  setting,  only  those  yfts  are  included  in  the  sample  that  satisfy  yy  >  log(c). 
Due  to  this  truncation  constraint, 


E  {yij  I  Vij  >  l°g  C)  =  ft  +  ft**j  +  Ti  +  E  {{yij  “  ft  ”  PlUj  Ti)  I  Vij  >  1°S  C] 

=  ft  +  fiitij  +  Ti  +  E  [Zij  |  Zij  >  Zij] , 

where  is  a  normal  random  variable  with  mean  0  and  z{j  =  log(c)  -ft-ft*i.j — r4.  Hence, 

E  [yij  |  Vij  >  logic)]  =  ft  +  ft  ft-  +  only  if  Zij  =  -oo.  Tallis  (1961),  and  more  recently 

McGill  (1992),  have  evaluated  conditional  expectations  and  higher  conditional  moments  of 
the  vector  random  variable  {Zn .  Zq,  . . . ,  Zik)  with  correlated  components.  Thus, 


E 


=  ft  + 


n 


[D  -  nV]  £ 


where  =  E  [{Zn,  Zi2, . . . ,  Zikf  |  Za  >  *a,  >  Zq,  •  •  • ,  Zik  >  zik]  and  ft  is  a  biased  es¬ 

timate  of  ft  with  bias  given  by  n/[X>  -  n2s2]£”=1a‘<ft-  If  we  are  allowed  to  manipulate 
the  samples,  then  the  bias  in  ft  can  be  made  equal  to  zero.  The  basic  idea  is  easy  to  follow 
in  a  special  case.  Consider  the  case  k  =  3,  equally  spaced  times,  and  AR(1)  correlation 
structure.  In  this  case 


1 


2nA  i 


£(y«s  -  Vi i) 


i=i 


and  the  bias  in  ft  reduces  to 

bias(ft)  =  2n^  ?  (^{^i3  I  %i3  •>  Zi 3)  —  ®(^*i  I  ^  • 

This  bias  can  be  eliminated  provided  we  can  arrange 


E(2f3  |  Ziz  5*  2i3)  —  E(2ii  j  >  Zn), 

or,  equivalently,  if  we  can  arrange  Z{3  =  zn  for  all  values  of  i.  If  the  subjects  are  included 

in  the  sample  when  yij  >  log(c),  then  2*3  =  log(c)  —  ft  —  ft ~  Ti  an<^  zn  —  log(c)  —  ft 
ft£.  _  2 Aft  -  Ti  and  zi3  and  za  are  not  equal.  On  the  other  hand  if  we  are  allowed  the 
freedom  to  shift  the  truncation  points  of  the  yft sJ;o  log(c)  —  2 Aft,  for  i  =  1, . . .  ,n,  then 
the  new  zi3  and  za  will  be  equal  and  the  bias  in  ft  will  disappear. 
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This  approach  has  two  drawbacks.  First,  the  (new)  truncation  point  of  ya  depends 
on  the  unknown  parameter  A-  However,  this  problem  can  be  resolved  by  repeatedly  substi¬ 
tuting  updated  estimates  of  A  until  no  further  change  occurs  in  the  value  of  the  estimate 
of  A  Second,  in  this  approach  subjects  are  removed  from  the  sample  because  the  new 
points  of  truncation  are  increased.  In  the  AFHS,  Michalek  et  al.  (1996b)  have  found  that 
the  reduction  in  the  sample  size  is  small  in  comparison  with  the  overall  size  of  the  sample 
and  the  procedure  reduces  the  mean-squared  error. 

The  bias  in  A  can  be  removed  in  the  general  case  in  a  similar  manner.  Letting 
a\  1  =  0  and  A  =  n/[V-  nV]  E”=i  a\yu  A  can  be  written  as 

ft  =  m  |a,w) "  lSii  “  S“1 ' 

\u-n  s  i  i=1  j=2 

By  arranging  the  truncation  points  of  the  y^s  such  that  for  all  values  of  i,  ztj  =  zn  for 
j  =  2, . . . ,  k,  the  estimate  of  A  will  be  unbiased.  But  zxj  =  zn  can  be  achieved  by  shifting 
the  truncation  points  to  log(c)  +  MUj  ~  for  j  =  2, . . . ,  k,  as  in  Michalek  et  al.  (1996b). 


5  Conclusions 


In  this  report  we  have  extended  the  results  of  Michalek  et  al.  (1996b)  to  arbitrary  values 
of  k,  the  number  of  the  repeated  observations  on  each  subject.  In  addition,  we  have  given 
expressions  for  the  estimates  of  the  other  parameters.  It  remains  to  be  seen  if,  m  the 
general  case,  the  estimate  of  A  will  continue  to  behave  as  observed  by  Michalek  et  al. 
(1996a).  That  is,  if  the  estimate  of  A  is. made  unbiased,  we  need  to  determine  whether  its 
mean-squared  error  be  small  also,  irrespective  of  the  correlation  structure. 


16 


REPORT  II 


The  Half-Life  of  Dioxin  in  Humans  Adjusted  for  a 

Categorical  Covariate2 


2'phis  research  was  carried  out  in  collaboration  with  Dr.  Kishan  G.  Mehrotra,  Syracuse  University, 
under  the  University  Resident  Research  Program  of  the  Air  Force  Office  of  Scientific  Research. 


17 


Abstract 


Pharmacokinetic  studies  of  environmental  contaminants  are  based  on  only  a  few  measure¬ 
ments  per  subject  taken  after  the  initial  exposure.  The  subjects  are  included  in  the  study 
only  if  their  body  burden  is  above  a  threshold  (c).  It  is  assumed  that  a  first  order  decay 
model  with  one  compartment  holds,  which  leads  to  a  repeated  measures  linear  model,  relat¬ 
ing  the  logarithm  of  the  concentration  of  the  contaminant  with  time  and  other  covariates. 
The  negative  of  the  coefficient  of  time  represents  the  decay  rate  (A),  with  the  half-life  given 
by  t1/2  =  ln(2)/A.  The  usual  WLS  estimate  of  the  decay  rate  is  biased  due  to  regression 
to  the  mean.  It  has  recently  been  shown  (see  Michalek  et  al.  (1996b))  that  this  estimate 
can  be  made  unbiased  if  the  data  are  properly  conditioned.  Since  body  fat  has  been  found 
to  be  an  important  covariate  for  predicting  the  decay  rate  of  dioxin,  an  unbiased  estimate 
of  the  decay  rate  is  proposed  that  is  adjusted  for  an  indicator  of  body  fat  category  and  its 
interaction  with  time. 
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1  Introduction 


Pharmacokinetic  studies  of  environmental  contaminants  in  humans  are  generally  restricted 
to  only  a  few  measurements  taken  after  the  initial  exposure.  The  initial  dose  is  usually 
unknown,  because  the  exposure  occurred  before  the  environment  was  known  to  be  con¬ 
taminated.  We  assume  that  a  one  compartment  first-order  decay  model  with  decay  rate  A 
holds  for  subjects  with  body  burden  above  a  background  level  determined  by  a  threshold 
c.  Subjects  are  included  in  the  study  only  if  their  body  burden  is  greater  than  c.  The 
threshold  is  defined  to  be  a  high  quantile,  such  as  99th  percentile,  of  the  distribution  of 
the  concentrations  of  the  contaminant  in  a  control  population.  We  assume  that  the  con¬ 
centrations  are  log-normally  distributed  which,  together  with  the  first-order  decay  model, 
implies  a  repeated  measures  linear  model  relating  the  logarithm  of  the  biomarker  and  time, 
with  slope  -  A.  It  is  known  that  the  WLS  estimate  of  A  is  biased  due  to  the  regression  to 
the  mean  because  of  the  way  the  subjects  are  included  in  the  study.  It  has  been  shown 
(see  Michalek  et  al.  (1996b))  that  if  the  data  are  properly  conditioned,  the  WLS  estimate 
of  A  can  be  made  unbiased.  This  process  is  appealing,  because  unbiased  estimates  can  be 
obtained  with  commercially  available  software,  such  as  SAS.  However,  this  estimate  is  not 
adjusted  for  any  covariates.  For  example,  because  dioxin  is  lipophilic,  body  fat  is  known  to 
be  a  predictor  of  the  concentration  of  dioxin.  Here  we  develop  a  WLS  estimate  of  the  decay 
rate,  which  is  adjusted  for  binary  covariate.  This  estimate  is  made  unbiased  and  is  used  to 
produce  an  estimate  of  the  decay  rate  adjusted  for  the  binary  covariate.  These  results  are 
applied  to  a  pharmacokinetic  study  of  dioxin  in  veterans  of  Operation  Ranch  Hand. 

In  Section  2  we  present  the  repeated  measures  linear  model  and  derive  closed  form 
expressions  for  the  WLS  estimates  of  the  parameters.  We  also  obtain  an  expression  for  the 
bias  of  the  coefficient  of  time  under  the  condition  that  the  body  burden  of  dioxin  in  the 
subjects  included  in  the  study  is  greater  than  a  threshold  c.  The  estimate  is  then  made 
unbiased.  In  Section  3,  we  discuss  the  estimates  in  terms  of  their  mean-squared  errors.  In 
Section  4,  we  apply  the  results  to  AFHS  data. 
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2  Model  and  Analysis 


Let  y{  denote  the  column  vector  of  k  observations  on  the  ith  subject.  We  assume  that 
these  observations  are  taken  at  k  equally  spaced  times  ( UtU  +  A, . . . , U  +  (k  -  1)A).  Due 
to  the  binary  nature  of  the  covariate,  the  subjects  are  partitioned  into  two  groups:  g  =  1 
and  g  =  —  1.  Without  loss  of  generality  we  assume  that  the  first  nx  observations  belong  to 
group  g  —  1  and  the  remaining  n2  belong  to  g  =  —  1.  Then,  a  model  for  the  nested  design, 
with  subjects  nested  in  groups,  is  given  by 

Uiji  =  A)  +  Pl^ij  +  filQt  T  039 1  X  Uj  +  Ti(t)  T 

£or  ^  j  =  1  £  =  1,2,  where  denotes  the  effect  of  the  ith  subject 

in  the  £th  group.  We  assume  that  YZi H*)  =  0  for  1  =  K  wiU  be  convenient  to 

analyze  the  problem  in  terms  of  vectors  of  observations  for  subjects.  To  that  end,  we  use 

the  following  notations. 

Notations:  For  i  =  1, 2, . . . ,  n,  and  ^  =  1,2, 

•  x/i  denotes  the  /c-dimensional  vector  of  observations  for  the  ith.  subject, 

•  ti  denotes  the  fc- dimensional  vector  of  times  for  the  ith  subject  adjusted  for  the  mean 
time.  In  other  words, 


U+  A 


t  +  ^  A  > 

t  4-  ^iA 

.2  =  (ti  —  t)l  +  Ae, 


where 


U  +  {k-  1)A 


*  +  ¥A 


-(fc-1) 


1  ,  1  -(fc-3) 

and  e  =  - 


(*-i) 


are  A:— dimensional  vectors,  and  t  —  n  Xyj=i  £»• 

•  We  assume  that  the  covariance  matrix  of  yu  denoted  by  $,  satisfies  AR(1)  structure, 


1  p  p2  ...  pk  1 

p  1  P  •••  P 


pk-l  pk- 2  pfc-3  _  1 
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•  The  vector  of  all  nk  y-observations  is  denoted  by  Y , 

Y*  —  [l/i,  2/25  •  •  •  Vrn  5  3/ni+H  •  *  •  ’  • 

•  The  vector  of  all  n  +  2  parameters  is  denoted  by  /34  =  [$o,  Pi,  fa,  fa,  t\,  t|], 

•  a  =  l**-1!., 

•  8  -  et$-1e, 

•  y*  = 

•  ti  =  nr1  E£i  fc,  and  t2  =  nj1  EIU1+i  with  n2  =  n  -  m, 

•  X  denotes  the  design  matrix,  and  Xj  denotes  the  design  matrix  associated  with  the 
ith  subject, 

•  V  denotes  the  block- diagonal  matrix  V  =  diag[$, 

Thus,  the  model  can  be  written  as 

Y  =  XP  +  €,  (2) 

where  X  is  given  by 


The  WLS  estimate  f3  of  /3,  derived  from  (2),  satisfies 

(XV^X)  3  =  XlV~lY .  (3) 

Since  XT'X  =  E”=i  Xl$~lXi,  and  X'V-'Y  =  E”=i  it  follows  from  (3)  that 

3  is  given  by 

3  =  Ex;*-v  W 

\i= l  /  *=i 
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Next,  we  write 


A 

i  B i 

:  b2 

'Wo' 

i= 1 

i  Ei 

:  0 

and  = 

1=1 

Wx 

. B 2 

:  0 

:  £2 

.  W2  . 

Here, 


A  = 


A\  A2 
A2  A\ 


for 


ELi  nt  0 

0  Eil  ne[s]  +  (te-T)2  +  6A2} 


and 


Is?  +  (*<  -  *)2  + |5A2] 


} 


1 

O 

1 - 

O 

l _ 

Bi  =  a 

■n  -  dm  -  f;)1., 

0* 

,  B2  =  a 

3?  -  k  -  mi, 

0* 

.  7?  -  K,  -  $)1* ,  . 

1 

1 

3 

1 

nT»! 

h-1 

3  «• 

to 

Wo  = 


W{  = 
W2‘  = 


ESi2/r+an1+i^ 

-  h)y; + Etn1+1(‘<  -  h)yt  +  ae?„,  e‘*"Vi 
.  Efiift  -  h)yt  -  ET„1+i(*i  -  h)y!  +  a  [e£i  e'ir'y,  -  E£„l+i  . . 

[(2/l  —  Vni)i  (2/2  —  Vn  1))  —  >  (Uni-l  ~  2/ni)]  ) 

[(2/ni+l  —  Vn)i  (Vni+2  ~  2/n)>  •  •  ‘  ’  (Pn—1  ~~  2/n)]  » 


where  Tf  =  [(tj  —  ^1)5  (^2  — ^1)5  •  •  •  >  (^ni-l  — ^l)]  an<^  ^2  —  [(^ni+1  ^2 )>  (^ni+2  ^2))  •  •  •  >  (^n-1 

t*)].  We  also  define  Ee  =  a  (I*  +  1*1$),  where  I*  is  an  identity  matrix  of  size  (n*-l)  x  {ne-l) 

and  1  £  is  a  column  vector  of  size  (ri£  —  1)  of  all  l’s. 
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We  find  an  explicit  expression  for  (]Ci!=i  X*®  1-X’i)  .  Let 


A~l  +  ol2Q 

;  D\ 

i  D2 

•  •  ♦  •  •  ♦ 

D\ 

i  Fi 

:  0 

D\ 

:  0 

:  F2 

Then,  using  problems  2.7,  2.8,  and  2.9  in  Rao  (1973)  repeatedly,  we  note  that  A  1  and  Q 
satisfy  the  following  ‘block-equality’  property, 


a 11 

a12 

:  a13 

a14 

a12 

o22 

;  a23 

a24 

’  Cl 

02 

. 

,  C_1  = 

... 

.  ... 

a13 

a14 

;  a11 

a12 

0,2 

i  Ci  . 

o23 

a24 

i  a12 

a22 

where  the  au  are  elements  of  A  4,  Qi  =  n\s\Qi  +  n2s\Q2i  an<f  O2  —  nisiQi  n2s2Q2-  In 
turn,  for  ^  =  1,2, 

_ 1 _  (t  —  te)2  ( i-te ) 

4njaA2S(asj  +  A2<5)  {t  —  ti)  1 


In  a  similar  manner, 


D\  =  —a 


Du 

Dn 


Ti  and  D2  =  —oc 


D\2 
—  D\2 


l2i 


where,  for  ^  =  1,2, 


Du  = 


1 


2aA2Sri£ 


(t  -  te) 

1 


For  i  =  1,2,  let  Ft  =  \  [i*  -  £l/l|]  + 

Substitutions  for  the  values  of  A  'L .  Q  and  the  other  terms  in  equation  (4)  and 
simplification  gives 
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A'* 

‘VE&y?  +  nJ1S*+itf  ‘ 

A 

1 

0 

>*v 

A 

2a 

n^EZiVi-^E^m+iyi 

V  &  J 

0 

2A  6 


nT1  ft  -  t)E& i  e^-1^  +  n^(*2  - 1)  E?=„1+i 

nr1  E£i  +  n2 1  E?=ni+i  e**"1!* 

nr1!?!  - *)  E£i  e^'^i  -  n2-x(t2  -  t)  E?=n1+i  <£*rxVi 

L  ^r1  E£i  e^"1^  -  n2 1  Ei=m+i  e**"1^ 


whereas  estimates  of  the  subject  effects  satisfy 

—  i i)  1  (t-ti)  1]%. 
n?[{i-i2)  1  (t-t2)  1]T2 

It  can  be  seen  that 


Ti 

1 

r2  _ 

2A  26 

"1 

Wm  +  Wq2 

+ 

’  JS\Wi  " 

Woi  -  W02  _ 

F2W2  _ 

Ti  = 


a 


y{  -  Pi 

h  —ti 

P2  ~  Pl 

t2  —  h 

• 

niAS 

.  2/ni-l  -  y?  . 

tni-i  ^i 

and 


r2  = 


Pni+l  P2 

tni  +1  ^2 

1 

*e§' 

1 

+ 

E?=ni+iet$  ay* 

ini +2  —  h 

a 

: 

n2A6 

‘ 

_Pn-l-P2 

in- 1  ”  ^2 

Special  Case 

We  specialize  the  above  formula  for  k  =  3.  In  this  case  a  =  (3  -  p)/(l  +  p),  $  =  2/(1  - 
P2),y*  =  (2/ii  +  Viz  +  (1  -  P)ya)/(1  -  P)  and  =  (ite  “  2/ii)/(1  “  P2)’  Consequently, 


*4 


S(2/i3  -  Pn)  +  nTT  £  fa®  Vil ) 
2^i A  2n2A  i=ni+1 


and 


ni 


2niA^ 


]C(*te  -  y«) 


2n2A  i=£f+i 


Y  (ya-Jfti) 


These  two  estimates  are  related  to  the  estimates  obtained  by  Michalek  et  al.  (1996b). 
Their  estimate  of  the  regression  parameter  (based  on  one  sample,  without  a  categorical 
covariate)  is  (2nA)“1  E (ye  -  Pn)-  0ur  estimate  can  be  described  as 
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First,  find  the  estimate  of  the  regression  parameter  from  the  first  sample  (with 
group  value  g  =  1).  Second,  find  the  estimate  of  the  regression  parameter  from 
the  second  sample  (with  group  value  g  =  —1).  Then,  a  simple  average  of  these 
two  estimates  gives  an  estimate  of  fa  and  a  simple  average  of  the  difference  of 
these  two  estimates  gives  an  estimate  of  fa. 

The  estimates  of  fa,  fa-  T (i)  and  T( 2)  also  simplify  in  this  case. 


3  Unbiasedness  and  Mean-Squared  Errors 


The  WLS  estimates  of  fa  and  fa  are  biased  because  the  yt' s  are  left  truncated  (only  those 
subjects  are  included  in  the  study  for  whom  the  values  of  the  y^  s  are  greater  than  log(c)). 
However,  the  bias  in  fa  and  fa  can  be  corrected  by  readjusting  the  truncation  points  of  the 
y.’s,  as  explained  in  Michalek  et  al.  (1996b).  Note  that  the  exercise  of  fixing  the  truncation 
points  of  the  y^s  will  be  useful  only  if  the  mean-squared  error  of  the  estimate  of  fa  does 
not  increase.  Michalek  et  al.  (1996b)  have  observed  that  their  procedure  for  correcting  the 
bias  actually  decreases  the  mean-squared  error. 


4  Results  for  the  Air  Force  Health  Study 


For  the  purposes  of  measuring  the  change  in  the  decay  rate  due  to  PBF  category,  we 
divide  the  data  in  the  AFHS  (see  Michalek  et  al.  (1996b)  and  Wolfe  et  al.  (1990))  into 
two  groups.  Group  1  consists  of  subjects  with  PBF  less  than  the  median  and  Group  2 
consists  of  subjects  with  PBF  greater  than  the  median.  The  half-fife  estimates  derived 
from  unbiased  estimates  of  the  decay  rates  are 

Half-life  of  dioxin  in  subjects  with  PBF  less  than  the  median  =  7.19  years 
Half-life  of  dioxin  in  subjects  with  PBF  greater  than  the  median  =  9.72  years 
In  contrast,  the  half-life  estimates  derived  from  biased  estimates  of  the  decay  rate  are 
Half-life  of  dioxin  in  subjects  with  PBF  less  than  the  median  =  7.33  years 
Half-life  of  dioxin  in  subjects  with  PBF  greater  than  the  median  =  10.23  years 

The  change  in  the  decay  rate  due  to  PBF  category  is  significant  .  The  biased  estimates  are 
larger  than  the  unbiased  estimates.  The  change  in  half-lives  due  to  PBF  category  is  smaller 
for  the  unbiased  estimates  than  the  biased  estimates  by  approximately  a  half  year. 
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Abstract 


In  longitudinal  studies,  subjects  are  sometimes  included  if  their  measurements  at  each  point 
of  time  are  above  a  threshold.  However,  estimates  of  parameters  of  a  regression  model  are 
often  desired  in  this  setting  in  the  presence  of  covariates.  WLS  estimates  of  the  regression 
parameters  in  the  presence  of  truncation  are  biased.  In  this  paper,  we  develop  maximum 
likelihood  estimates  of  the  regression  parameters  when  the  repeated  measurements  have  a 
multivariate  normal  distribution  and  the  data  are  restricted  to  lie  above  a  threshold.  In 
addition,  the  model  involves  subject  effects.  The  estimates  are  obtainable  by  solving  a 
system  of  nonlinear  equations. 
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1  Introduction 


In  longitudinal  studies,  complications  may  arise  by  a  natural  truncation  introduced  through 
the  subject  selection  process.  For  example,  Davis  (1976)  reported  a  study  in  which  men 
aged  35  to  39  years  were  screened  for  cholesterol  levels.  Only  those  men  whose  cholesterol 
level  exceeded  265  mg%  were  selected  to  be  in  the  trial  and  a  treatment  was  administered  to 
them.  Even  though  it  was  reasonable  to  give  treatment  to  only  those  with  a  “problem”  this 
selection  procedure  introduced  truncation  and  the  effect  of  the  treatment  was  confounded 
with  the  well  known  regression  to  the  mean  effect.  This  aspect  is  well  recognized  and  has 
been  addressed  for  Davis’s  data  (see  Senn  and  Brown  (1985))  and  other  similar  studies  (see 
James  (1973)).  If  a  study  is  well  designed  then  truncation  is  generally  not  a  problem;  for 
example,  consider  the  study  reported  by  Gardner  and  Heady  (1973).  A  blood  sample  was 
taken  at  a  screening  visit  and  the  cholesterol  content  was  analyzed.  Subjects  participating 
in  the  study  consisted  of  the  top  third  of  the  distribution  together  with  a  random  half  of 
the  bottom  third.  Subjects  in  the  top  third  were  randomly  assigned  to  the  treatment  or 
to  a  control  group  (receiving  placebo),  while  the  men  from  the  bottom  third  were  given 
placebo  only  and  formed  a  second  control  group.  Follow  up  studies  were  conducted  at 
6-month  intervals  for  2  years  and  annually  afterwards.  In  this  study,  the  truncation  was 
equally  applicable  to  both  the  treatment  and  the  placebo  groups  in  the  top  third  group. 
Therefore,  comparison  between  control  and  treatment  in  the  top  third  group  was  not  biased 
by  regression  to  the  mean,  whereas  a  comparison  between  the  treatment  group  and  the 
bottom  third  group  would  have  been  inappropriate  without  adjusting  for  truncation. 

The  Air  Force  is  conducting  a  20-year  prospective  study  of  veterans  of  Operation 
Ranch  Hand,  the  unit  responsible  for  aerial  spraying  of  Agent  Orange  and  other  herbicides 
in  Vietnam  from  1962  to  1971.  Physical  examinations  were  administered  in  1982,  1987  and 
1992.  Since  1987,  exposure  has  been  indexed  by  a  measurement  of  dioxin,  in  serum.  In  1987, 
all  willing  Ranch  Hand  veterans  and  comparison  veterans  (who  served  in  Southeast  Asia 
during  the  same  period  but  who  were  not  involved  with  spraying)  were  asked  to  contribute 
blood  for  a  dioxin  assay;  870  Ranch  Hands  were  assayed  and  received  quantifiable  results. 
As  part  of  a  pharmacokinetic  study  of  dioxin,  all  343  Ranch  Hand  veterans  with  dioxin  levels 
above  10  parts  per  trillion  (ppt)  in  1987  and  who  had  stored  serum  from  1982  examination 
were  selected.  Dioxin  measurements  for  these  veterans  were  also  made  in  1992.  In  the 
pharmacokinetic  study,  only  those  veterans  whose  levels  were  greater  than  10  ppt  at  all 
three  physical  examinations  were  included,  resulting  in  left  truncation.  The  initial  dose  of 
a  Ranch  Hand  veteran  was  unknown  because  the  exposure  occurred  before  the  herbicides 
were  known  to  be  contaminated.  The  goals  of  the  pharmacokinetic  study  were  to  find  an 
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estimate  of  the  decay  rate  of  dioxin  in  these  veterans  and  assess  the  significance  of  changes 
in  the  decay  rate  with  covariates,  such  as  PBF  and  age. 

The  AFHS  has  four  special  features:  (i)  the  observations  are  truncated  (ii)  each 
subject’s  observations  over  the  time  periods  are  correlated  (iii)  a  fixed  subject  effect  is 
necessary  and  therefore  introduces  large  number  of  parameters  and  (iv)  there  are  covariates 
which  may  influence  the  decay  rate. 

The  estimation  of  the  decay  rate  of  dioxin  has  been  the  subject  of  several  articles 
(see  Needham  et  al.  (1994),  Pirkle  et  al.  (1989),  Michalek  et  al.  (1996a),  Michalek  et  al. 
(1996b)).  In  all  of  these  studies  a  one-compartment  first  order  decay  model,  with  decay 
rate  A,  was  assumed  to  hold.  If  Cy  denotes  the  concentration  for  subject  i  Uj  years  after 
exposure  and  Ci0  is  the  (unknown)  initial  concentration,  then  the  first-order  kinetics  model 
is  given  by 

Cij  =  Ci0  e~xtii .  (1) 

Because  not  all  subjects  were  exposed  at  the  same  time,  we  have  considered  ft,  the  time  in 
years  from  the  initial  exposure,  to  be  subject-dependent.  If  we  take  the  natural  logarithm 
of  (1),  we  obtain 

In  {Cij)  =  ln(C’jo)  -  Aft.  (2) 

Equation  (2)  motivates  a  repeated  measures  linear  model  on  the  log  scale,  with  ft  =  -A, 

yij  =  /?o  +  T*  +  ft  ft  +  i  —  1, . . .  ,n,  j  =  1, . . .  ,k.  (3) 

A  WLS  procedure  is  an  obvious  choice  for  estimation,  however,  WLS  estimates  will  be 
biased  if  the  observations  are  truncated. 

The  magnitude  of  the  bias  can  be  studied  by  means  of  a  closed  form  expression  for 
the  WLS  estimate  of  ft.  Michalek  et  al.  (1996a)  have  described  an  ad  hoc  procedure  that 
iteratively  results  in  an  approximately  unbiased  estimator.  Michalek  et  al.  (1996b)  derived 
a  closed  form  estimate  and  used  it  to  estimate  the  bias.  But,  since  their  formula  for  bias 
also  contained  the  parameter  they  were  trying  to  estimate,  it  was  difficult  to  assess  the 
accuracy  of  the  estimate  of  the  bias.  Another  approach  is  to  consider  a  procedure  that 
accommodates  the  truncation  and  provides  estimators  that  are  relatively  easy  to  compute, 
are  at  least  asymptotically  unbiased,  and  are  asymptotically  normally  distributed.  In  this 
paper,  we  give  an  alternative  method  to  provide  closed  form  expressions  for  the  WLS 
estimates  of  the  parameters  of  a  regression  model  with  repeated  measures  and  fixed  effects 
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when  the  observations  are  truncated.  We  also  obtain  ML  estimates  of  the  parameters  based 
on  the  truncated  observations.  The  asymptotic  properties  of  the  estimators  follow  from 
standard  maximum  likelihood  theory.  We  consider  two  procedures  to  estimate  the  variances 
of  the  ML  estimates,  one  via  the  bootstrap  method  and  the  other  via  the  inversion  of  the 
information  matrix. 

In  section  2,  we  introduce  the  model  in  general  form  and  obtain  the  estimating 
equations  for  the  case  of  truncated  observations.  We  obtain  closed  form  expressions  for  the 
WLS  estimates  for  the  untruncated  case  and  provide  ML  estimates  for  the  case  of  truncated 
observations.  Section  3,  we  derive  the  variance-covariance  matrix  of  the  estimates  via 
inversion  of  the  information  matrix.  We  apply  the  results  to  AFHS  data  in  Section  4. 


2  Maximum  Likelihood  Estimation 


We  consider  estimation  of  the  parameters  of  the  regression  model  with  fixed  subject  effects 
and  with  repeated  measures  in  the  context  of  left  truncation.  A  similar  development  can 
be  addressed  for  right  or  middle  truncation.  Consider  model  (3)  in  vector  notations.  Let 
y.  be  the  random  variable  representing  the  k  observations  associated  with  the  ith  subject. 
Then,  model  (3)  can  be  written  as 

Ui  —  ®  1 ,  .  .  .  ,  72, 

where  we  assume  that  the  errors  follow  a  multivariate  normal  distribution  with  mean  0 
and  covariance  matrix  Q,  and  0i  is  given  by 

0i  =  /30l  +  /?iti  +  7il,  (4) 

where  t\  =  (tn,  tn, . . . ,  iifc)  and  1*  =  (1, ,  1)  are  ^-dimensional  column  vectors  and  the 
Tj’s  satisfy  the  restriction  Ti+T2  +  ...  +  Tn  =  0. 


The  ML  estimates  are  derived  under  the  assumption  that  2/i  >  °  for  a11  values 
of  i,  where  a*  =  (a,  a,..., a).  Regression  models  with  truncation  have  been  discussed 
extensively  in  the  literature;  see  Maddala  (1983).  But  to  the  best  of  our  knowledge,  a 
general  discussion  of  the  repeated  measures  model  with  fixed  effects  and  truncation  has 
not  received  much  attention.  Let  S{a,0i)  be  the  survival  function  for  the  ith  subject. 
Then,  the  conditional  likelihood  function  and  the  corresponding  log-likelihood  function  are 


given  by 


n 

i= 1 


(27t)~2  I  Q,  |  2  exp[“(2/{  -  0iY  1  (Vi-  Oi)]/S(a,0i) 


30 


and 


=  II  f(yi>0i)/S(a’0i)’ 

i= 1 


where 


In (L)  =  £>(£,)  =  £> (/(»„»<))  -  t,HS(a,ei)), 

i=l  i=l  *=1 

rOO 

S(a,0i)=  /  /(yttWVi- 

j  a 


(5) 


Equation  (5)  has  two  components,  the  first  represents  the  ‘usual’  log-likelihood 
without  the  truncation  effect,  whereas  the  second  component  is  due  to  truncation.  Under 
the  assumption  of  normality,  the  ML  estimates  obtained  by  using  the  first  component  are 
the  same  as  the  WLS  estimates.  To  simplify  the  presentation,  we  first  derive  the  WLS 
estimates. 


2.1  The  weighted  least-squares  estimates 


The  WLS  estimates  of  the  parameters  are  obtained  by  minimizing 

Q  =  Y^iVi  -  0i)tJrl(2/i  “  0i)- 


i=l 


By  differentiating  Q  with  respect  to  ft,  ft  and  the  n’s,  using  the  chain  rule  and  noting 
from  (4)  that 


aft 


do. 


d0i 


-l  —  =t  —  =  i 

aft  ’  aft  *’  dTi  ’ 


(6) 


and  equating  the  derivatives  to  zero,  we  find,  using  (6),  that 

dQ 


t=  1 

n 


aft 


=  =  0’ 


i=l 


and,  for  i  =  1, . . . ,  n  —  1, 

8Q 


^-  =  itQ-l[(yi-yn)-(0i-0n))  =  O. 
OTi 
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Substitution  for  0*  in  terms  of  #>,  Pi  and  the  Tj’s  and  a  straightforward  simphfication  gives 

yi^Vi  =  n(ltn-1l)/0o  +  (Elt^“1^)A, 

i=1 

Vt'n-'y,  =  (V  l'tr'tOft  +  (Etsa-'t,)/?!  +  pi’n-'^n, 

i=l  i=l  i=l  *” 1 

and,  for  i  =  1, 

1*Q— 1  (2/*  -  yn )  =  1  tQ,~1{ti  —  tn)pi  +  1*0  ll{ri  -  r„). 


The  key  to  obtaining  a  solution  for  the  above  system  of  equations  is  to  first  obtain 
an  expression  for  Th  i  =  1,  in  terms  of  the  other  parameters,  substitute  it  in 

the  first  two  equations,  and  solve  for  po  and  P\.  The  WLS  estimates  of  Po,  Pi  and  t*  for 
i=  1  are 


Po 

Pi 


Ti 


_1_ 

na 


'£ita-lVi 


Li=l 


nal^l 


aELitSn-1^  -  EL(itn-13/i)(itfi-Hi) 
aZ?=itin-Hi  -  Er=x(ltn-1ti)(ltn-1ti) 

-  [l’fT1^  -  27)]  -  -  [1‘n-1^  -  *)]  A, 

Gi  ^ 


(7) 


where  a  =  1*0  11- 


Remark  1:  In  the  absence  of  fixed  subject  effects,  the  Tj’s,  these  normal  equations  are 
similar  to  the  “classical”  normal  equations  of  linear  regression.  The  main  difference  is 
that  all  k  observations  associated  with  the  ith  subject  are  represented  by  a  scalar  variable 
Ui  =  l*£I-12/j.  Similar  scalar  reductions  are  obtained  for  the  tj’s.  Thus  repeated  measures 
regression  can  be  treated  essentially  as  ordinary  simple  regression  with  a  change  of  variables. 

Remark  2:  The  case  that  U  =  tjl  +  Ae  represents  successive  observations  taken  at 
equally  spaced  intervals  of  length  A  for  each  subject.  Let  e*  =  (0, 1, . . . ,  (k  -  1)).  In  this 
case,  (7)  simplifies  to 

T  a  S?=i  -  (l^^e) 

naA(etCl~1e  —  a) 


Remark  3:  For  k  =  3,  if  0  has  AR(1)  structure,  then  (7)  simplifies  to 


EjU  (Vsi  Vn) 

2nA 
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This  special  case  was  first  obtained  by  Michalek  et  al.  (1996a).  Further,  they  obtained  the 
same  expression  when  ft  is  a  Toeplitz  matrix  of  order  3.  The  expression  for  ft  for  larger 
values  of  k  when  ft  is  either  an  AR(1)  or  Toeplitz  involves  elements  of  ft. 

Remark  4  The  WLS  estimate  of  ft  is  biased  because  truncated  values  of  y*  are  used  in 
the  estimate.  In  general,  the  bias  is 

bias  (ft)  a^=1t*Q-iti-£!L1(ltft-1ti)(ltft-1ti)  ’ 

where  Hi  =  Hi(a-Oi )  is  the  multivariate  hazard  vector  defined  in  McGill  [(1992),  equation 
(9)].  This  bias  was  obtained  by  Michalek  et  al.  (1996b)  for  the  special  case  when  k  -  3 
and  ft  has  an  AR(1)  structure.  However,  the  formula  for  the  bias  depends  on  ft.  In  the 
absence  of  an  unbiased  estimate  of  ft  it  is  difficult  to  assess  the  accuracy  of  the  estimate 
of  this  bias.  ML  estimation,  discussed  below,  adjusts  for  truncation  through  the  second 
expression  of  equation  (8). 


2.2  Maximum  likelihood  estimates 

The  WLS  estimates  were  obtained  by  ignoring  the  second  component  of  the  likelihood 
expression  in  (5).  The  ML  estimates  on  the  other  hand  use  the  entire  likelihood,  taking 
into  account  the  truncation  component  as  well.  It  can  be  verified  that 

-  n-\y.  -  et)  -  sr'Efe,,  -  e,  I  y,  >  a), 

9ft 

where  E (y4  -  ft  |  Vi  >  a)  denotes  the  conditional  expectation  of  y{  -  ft  given  y{  >  a. 
Consequently,  the  ML  estimates  of  ft,  ft  and  the  n’s  are  solutions  of  the  system  of  equations 

=  £  1‘ft"1  (Vi  -  ft)  -  £  l*ft_1E(j/i  -  ft  |  Vi  >  a)  =  0, 

i= 1  *=1 

=  -  ft)  -  E tp-'EiVi  -  ft  I  Vi  >  a)  =  0,  (8) 

i=l  »=1 

1, 

l'ft'MC Vi  -Vn)-  (Oi  -On)] 

-ltft-1[E(yi  -  ft  |  y{  >  a)  -  E (yn  -0n\yn>  a)]  =  0. 


dlnL 

dTi 


5  In  L 
9ft 
91nL 
9ft 


The  second  components  of  (8)  introduce  non-linearity.  A  solution  is  obtained  by  an 
iterative  procedure,  described  below. 
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Step  1.  Use  WLS  estimates  of  the  parameters  as  the  initial  estimates  A>( 0), 
ft  (0)  and  Tj(0),  i  =  1, . . . , n  -  1,  and  obtain  0i(O). 

Step  2. 

(a)  Evaluate4  E  [y{  -  ft(0)  |  y4  >  o]  for  each  value  of  i. 

(b)  Substitute  these  expected  values  in  (8)  to  get  a  new  set  of  equations 
which  are  similar  to  the  equations  for  the  WLS  estimates  with 
replaced  by 

y*  =  Vi  +  E  [vi  -  0i{ 0)  |  Vi>  a] 

for  each  value  of  i.  Using  y*  in  place  of  in  (7)  obtain  /?o(l), Pi{l) 
and  fj(l)  for  i  =  1, . . .  ,n  —  1,  and  04(1). 

Step  3.  Repeat  Step  2  with  ft(0)  replaced  by  ft(l).  Repeat  this  procedure 
until  the  difference  between  two  consecutive  estimates  is  negligible. 

Our  ML  procedure  is  similar  to  the  EM  algorithm.  Navidi  (1997)  gives  a  simple 
graphical  illustration  of  the  EM  algorithm.  In  the  classical  EM  algorithm,  the  expectation  of 
the  unobservable  component  of  the  log  likelihood  function  is  obtained,  whereas  we  estimate 
the  conditional  expected  value  by  the  most  recent  estimate  of  the  unknown  parameter  0*. 


3  Estimates  of  Information  Matrix  and  the  Variance 
of  ML  Estimates 


The  ML  estimates  obtained  in  the  previous  section  fall  into  the  standard  maximum  like¬ 
lihood  class.  Properties  of  the  estimates  follow  from  standard  results.  The  asymptotic 
distribution  of  ft  and  A  is  bivariate  normal  with  covariance  given  by  the  inverse  of  the 
Fisher  information  matrix.  The  information  matrix  is  estimated  by  evaluating  the  negative 
of  the  second  derivative  of  the  log-likelihood  at  the  ML  estimates,  denoted  by  X.  For  our 
model  this  matrix  is  an  (n  +  1)  x  (n  +  1)  matrix.  To  obtain  the  elements  of  this  matrix  we 
use  the  chain  rule  of  differentiation  and  equation  (6).  To  this  end,  we  find  that 

30i0i 

4A  method  to  evaluate  these  and  similar  conditional  expectations  is  given  in  the  Appendix. 
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where  \P(a,  6{)  denotes  the  conditional  covariance  matrix  of  y*  -  0i,  given  2/i  >  a •  That  1S> 

(a,  0$  =  E  [(yt  -  0i){yi  -  0iY  |  Vi  >  a]  -  E  [(yf  -  0i)  I  Vi  >  «]  E  [(ifc  -  «<)*  I  Vi  >  a]  • 


Application  of  (6)  gives 


d2lnL 
a2  In  L 

'  a/?2 

a2lnL 

a^0a/?i 


i=l 

i=l 

1=1 


Likewise,  for  i  =  1, . . . ,  n 

a2  In  L 
drf 
a2  In  L 

dridTj 


-  1,  j  =  1, 

=  ltQ“1^r(a,  0„)fl_1l 


a2lnL 

dfodTi 
a2  In  L 

dfiidTi 


1‘JT1 


lTT^a,  -  l'Q-1#  (a,  0„)fi  Hn 


Expressions  containing  conditional  moments  can  be  evaluated  by  numerical  integration. 
The  case  that  k  =  3  and  Q  has  an  AR(1)  structure  has  been  described  in  detail  in  the 
Appendix.  Due  to  the  special  nature  of  the  information  matrix,  shown  below,  its  inversion 
can  be  easily  computed.  We  write  X  in  block  form  as 


where  Xn  represents  the  information  expression  associated  with  (30  and  /?i,  X<n  represents 
the  information  expression  associated  with  t*,  for  i  =  1, . . .  ,n  - 1,  and  Xx2  =  X\Y  represents 
the  cross-information  expressions.  Then, 


X22  —  •  •  •  5  Qn— l]  QnXn— lljv— 1> 

where  Qi  =  and  ln-i  denotes  an  (n  -  l)-dimensional  vector  of  all  l’s. 

Using  this  special  property  of  J22  and  the  block  inverse  approach  described  in  problems  2.7 
and  2.8  in  Rao  (1973),  the  asymptotic  variances  and  covariances  can  be  estimated  for  all 
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parameters.  In  other  words,  the  numerical  inversion  of  J,  whose  size  is  (n  + 1)  x  (n  + 1),  is 
not  necessary;  an  estimate  of  the  asymptotic  covariance  matrix  only  requires  the  inversion 

of  a  2  x  2  matrix. 

The  Bootstrap  Method: 

An  alternative  to  the  matrix  inversion  method  of  covariance  estimation  is  the  boot¬ 
strap.  The  matrix  inversion  method  depends  second  order  conditional  moments,  whereas 
the  bootstrap  method  is  computationally  intensive.  The  bootstrap  method  will  be  studied 

in  a  future  report. 


4  Discussion 


In  order  to  obtain  the  variances  of  the  intercept  and  slope  estimators  we  needed  to  invert 
the  information  matrix  (which  in  our  case  was  a  241  x 241  matrix).  However,  we  made  use  of 
the  special  structures  which  reduced  the  inversion  problem  into  inverting  a  2  x  2  matrix.  A 
comparison  of  the  ML  estimate  of  dioxin  half-life  and  it’s  standard  deviation  with  the  WLS 
estimate  and  standard  deviation  indicates  that  the  ML  estimate  of  the  half-life  is  about  1 
year  shorter  (7.3  years)  and  the  standard  deviation  of  the  ML  estimate  is  slightly  larger 
than  that  of  the  WLS  method.  Although  we  have  concentrated  on  a  special  covariance 
structure,  our  results  are  general  and  can  be  used  for  other  covariance  structures  as  well. 
The  complexity  of  the  problem  is  essentially  in  the  numerical  evaluation  of  the  conditional 
expectation  needed  in  Step  (2)  of  the  ML  procedure.  The  regression  model  can  be  extended 
to  include  other  covariates.  If  there  is  no  truncation  then  standard  software  will  provide 
the  WLS  or  ML  estimates  and  their  standard  errors.  If  the  observations  are  truncated,  the 
ML  method  will  require  special  purpose  programs.  In  the  context  of  truncated  observations 
the  WLS  estimates  are  generally  biased  and  the  maximum  likelihood  provides  more  reliable 
estimates,  at  least  when  the  sample  size  is  large.  If  the  sample  size  is  small  the  ML  estimates 
can  be  unreliable,  so  one  may  resort  to  procedures  such  as  the  bootstrap  and  the  jackknife, 
whereas  the  WLS  estimates  are  unbiased  regardless  of  the  sample  size. 

Even  though  the  number  of  equations  to  be  solved  is  minimal,  because  of  the  trun¬ 
cation,  the  equations  are  non-linear  in  the  parameters.  Therefore  they  must  be  solved 
iteratively. 
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Appendix 


We  derive  the  conditional  expected  value  and  variance  of  yi}  given  that  it  takes 
values  larger  than  a.  We  assume  that  Cl  has  AR(1)  structure.  In  the  Mowing  development, 
for  ease  of  presentation  and  without  loss  in  generality,  we  drop  the  suffix  i  from  y{  and  the 
associated  mean  For  convenience  we  consider  the  case  a 1  =  a(l,l,...,l),  the  general 
case  follows  similarly.  It  is  convenient  to  derive  these  results  in  terms  of  centered  random 
variable  z  —  y  —  0  •  The  condition  y  >  <z  becomes  z  >  a  0  • 

Case  1  k  =  3.  We  assume  that  the  three  dimensional  random  variable  2  is  normally 
distributed  with  mean  0  and  covariance  matrix  Cl  which  satisfies  the  AR(1)  assumption. 
Then,  the  conditional  distributions  of  Z\  and  z3  given  z 2  are  stochastically  independent. 
Consequently,  the  joint  distribution  of  z  can  be  written  as  a  product  of  three  normal 
densities, 

f(z)  =  (2?r)-i  |  Cl  \~i  exph^fT1  z] 

=  /l(zi  |  Z2)  x  /3(z3  I  Z2)  X  /2(z2),  (!) 

where 

.  .  I  .  1  r  (Zl  ~  pZ2f^ 

/l(Zl|Z2)  _  ^(1  -  J)  eXPl  20  -  P2)  ’ 

x  1  _r  fa-pzi)2, 

/s(Z3|Z2)  _  ^(1-7)  [  2(1 — p2) 

with  <fr(-)  representing  the  density  function  of  the  standard  normal  random  variable.  This 
allows  us  to  express 

too  too  too 

S(a,0)  =  /  /  /  f(y,0)dy 

Ja  Ja  Ja 

too 

-  /  si(a,z2)  s3(a,z2)  0(z2)  dz2, 

Ja—6  2 

where  for  j  =  1,3,  Sj( a,z2)  =  /~#j  /,(*,-  I  22)  dz,  =  1  -  $[((a  -  0S)  -  pzsWl  -  A 
where  3>  is  the  standard  normal  distribution  function.  Thus,  S(a,0)  can  be  evaluated  as  a 
one-dimensional  integral  in  z2.  Using  (1)  we  now  derive  the  first  two  conditional  moments. 


39 


Conditional  expectations:  We  consider  E [zj  \  z  >  a  -  0],  j  -  1,2,3.  In  view  of 
equation  (1)  these  quantities  are  reducible  to  one-dimensional  integrals  as  shown  below. 
The  evaluation  of  E  [z2  \z>  a -9]  is  straightforward, 

Efe  !»>“-«!  = 

=  S~l(a,0)  [  Si(a,  z2)s3(a,  z2)z2f2(z2)dz2. 

Jo—02 


It  is  easy  to  verify  that  for  j  —  1, 3, 


|  z2)dzj  =  [-\/l  —  P2hj(a,  z2)  +  pz2]sj(a,  z2) 

=  'ipj(a,z2)sj(a,z2), 


(2) 


where 


hj(a,z2 )  =  <p{-  -j==  =jpi)/sj{a,z2). 


Consequently,  for  j  =  1,3, 

/*oo 

E  [zj  \  z>  a  — 0]  =  iS_1(a,  9)  /  Si(a,  z2)s3(a,  z2 )  ipj(a,  z2 )  /2(z2)dz2. 

J  CL— 62 

Conditional  covariances:  The  conditional  covariance  of  z  is 


Cov[z  \  z>  a -0]  =  E  [zz*  \  z  >  a  -  0)  -  E[z  \  y  >  a]  E*[z  |  z  >  a  -  0]. 

The  matrix  E[zz4  \  z  >  a  —  9]  has  six  distinct  elements,  all  of  which  can  be  reduced  to 
one-dimensional  integrals.  Clearly, 

E[z%  I  z  >  a  —  0}  =  S-1(a,0)  [.[.[,  zlf(z)&z 

1  *  Ja—Oi  J 0.-62  Jo— $3 

=  S-1(a,0)  f  si(a,z2)  53(0^2)  zl<p(z2)dz2. 

Ja—0  2 


Using  (2)  it  follows  that  for  j  =  1,3, 

E[zj  z2  |  z  >  a  —  0}  =  <S'-1(a,  9)  J  Si(a,  z2)  s3(a,  z2)  -ipj(a,  z2)z2f2(z2)dz2, 

and 

E[zxz3  I  z  >  a  —  0]  =  S~1(a,0)  (  si(a,z2)  sz{a,z2)'ipi(a,z2)il)Z(a,z2)f2{z2)dz2. 

J  CL 

To  evaluate  E(z?  |  z  >  a  —  0),  we  need  the  following  intermediate  result.  Using  integration 
by  parts,  we  obtain 


|  z2)  dzj  —  tj(a,z2)sj(a,z2), 
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where 


tj(a, z2)  =  (1  -  p2)  +  p24  +  (1  "  p2)1/2M°>  «*)(«  ~  eo  +  /**)• 

Hence, 

E\z2  \z>  a -6}  =  S'1  {a,  9)  f°°  si(a,  z2)  s3(a,  z2)  tj(a,  z2)  <f>(z2)dz2. 

L  J  Ja-9 2 

Case  2  fc  =  4.  The  joint  density  of  z  =  y  —  6  can  be  written  as 

f(z)  =  /l.23(zi  I  Z2,Z3)f4.23{Z4  \  Z2,  Z3)f23(z2,  Z3), 

where  the  /’s  denote  normal  density  functions.  Consequently  the  first  and  second  moments 
of  the  Zi  s  can  be  written  as  two-dimensional  integrals. 

Case  3  k  >  5.  The  approach  taken  above  can  be  applied  in  this  case  also.  For  example, 
when  A:  =  5  it  can  be  verified  that  the  joint  density  of  z  =  y  —  9  satisfies 

f(z)  —  /l.234(zl  I  Z2,  Z3,  Z4)f5.23i(Z5  |  Z2,  Z3,Z4)f2.3(z2  |  Z3)f^.3{z^  \  Z3)f3(z3). 

Thus  the  survival  function  and  conditional  moments  can  be  reduced  to  three-dimensional 
integrals. 
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